Ouachita Parish
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Jin, Zhuoran, Yuan, Hongbang, Men, Tianyi, Cao, Pengfei, Chen, Yubo, Liu, Kang, Zhao, Jun
Despite the significant progress made by existing retrieval augmented language models (RALMs) in providing trustworthy responses and grounding in reliable sources, they often overlook effective alignment with human preferences. In the alignment process, reward models (RMs) act as a crucial proxy for human values to guide optimization. However, it remains unclear how to evaluate and select a reliable RM for preference alignment in RALMs. To this end, we propose RAG-RewardBench, the first benchmark for evaluating RMs in RAG settings. First, we design four crucial and challenging RAG-specific scenarios to assess RMs, including multi-hop reasoning, fine-grained citation, appropriate abstain, and conflict robustness. Then, we incorporate 18 RAG subsets, six retrievers, and 24 RALMs to increase the diversity of data sources. Finally, we adopt an LLM-as-a-judge approach to improve preference annotation efficiency and effectiveness, exhibiting a strong correlation with human annotations. Based on the RAG-RewardBench, we conduct a comprehensive evaluation of 45 RMs and uncover their limitations in RAG scenarios. We also reveal that existing trained RALMs show almost no improvement in preference alignment, highlighting the need for a shift towards preference-aligned training. We release our benchmark and code publicly at https://huggingface.co/datasets/jinzhuoran/RAG-RewardBench/ for future work.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > United States > Michigan (0.05)
- (29 more...)
How to scale automation in DevOps environments
The telecom giant CenturyLink created a federated program to scale automation efforts, where one dedicated team curated standards and best practices that were then rolled out across the company. CenturyLink -- a facilities-based communications company with 40,000 full-time employees, based out of Monroe, La. -- wanted to achieve automation at scale and blend the best characteristics of application development, scripting, and robotic process automation (RPA) tooling, said Troy Ferrence, senior manager of automation solutions architecture at CenturyLink. For the past several years, Ferrence's team has focused on scripting and application development, but its efforts accelerated when CenturyLink introduced RPA software that used the UiPath platform. CenturyLink expanded its work to more than 19 teams dedicated to this project, referred to as center of excellence teams, in the last year and developed over a thousand different automations. "We have a lot of legacy applications where we don't have any type of interface or availability for enhancement which are perfect candidates for RPA," Ferrence said.
POLITICO Playbook: Robert Mueller's long tail
Additional documents from former special counsel Robert Mueller's report provide a layer of texture to the Russiagate scandal. YOU THOUGHT THE MUELLER REPORT WAS OVER, didn't you? Well, yesterday, BuzzFeed's Jason Leopold -- a level 19 FOIA ninja -- and his colleagues got their hands on detailed summaries of the interviews three Trump aides gave to the FBI, known as "302 reports," along with other documents. And while they don't appreciably change our understanding of the Russiagate scandal, they do add a layer of texture to what we already knew. And even after his firing, he was still in touch with top campaign officials up to Election Day, though campaign 'CEO' Steve Bannon warned in an email to Jared Kushner: "We need to avoid this guy like the plague."
- Asia > Russia (0.15)
- Europe > Ukraine (0.14)
- North America > United States > New York (0.05)
- (16 more...)
- Media (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- (3 more...)
Delta, We're Ready When You Are!
That has been the advertising slogan, on and off, for Delta Air Lines since the 1960s. Throw in "We Love to Fly, and It Shows," "You'll Love the Way We Fly," and a few other customer-focused taglines and you get the idea that Delta, if it makes good on its slogan – or brand promise, as it is often called today – will make its customers very happy. It did, and Tom Peters and Robert H. Waterman, Jr.'s 1982 mega-bestselling book, In Search of Excellence, featured Delta in a case study as one of the most excellent companies in the world. Delta was founded in 1928 in Monroe, Louisiana. Today, with its worldwide alliance partners, it serves more than 300 destinations in more than 50 countries.
- North America > United States > Louisiana > Ouachita Parish > Monroe (0.25)
- North America > United States > Nevada > Clark County > Las Vegas (0.07)
- Transportation > Passenger (1.00)
- Transportation > Air (1.00)
- Consumer Products & Services > Travel (1.00)
Views of AI, robots, and automation based on internet search data
Artificial intelligence, robots, and automation are rising in importance in many areas. As noted in the recent book, "The Future of Work: Robots, AI, and Automation," there are exciting advances in finance, transportation, national defense, smart cities, and health care, among other areas. Businesses are developing solutions that improve the efficiency and effectiveness of their operations and using these tools to improve the way their firms function. Yet there also are concerns about the impact of these developments on jobs and personal privacy. A Pew Research Center national survey revealed considerable unease about emerging trends.
- North America > United States > California > San Francisco County > San Francisco (0.15)
- Asia > China (0.06)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.05)
- (16 more...)
- Transportation > Ground > Road (1.00)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
- Government > Regional Government (0.70)
Cautionary Tale of a Bionic Man
One night in 1982, John Mumford was working on an avalanche patrol on an icy Colorado mountain pass when the van carrying him and two other men slid off the road and plunged over a cliff. The other guys were able to walk away, but Mumford had broken his neck. The lower half of his body was paralyzed, and though he could bend his arms at the elbows, he could no longer grasp things in his hands. Fifteen years later, however, he received a technological wonder that reactivated his left hand. It was known as the Freehand System. A surgeon placed a sensor on Mumford's right shoulder, implanted a pacemaker-size device known as a stimulator just below the skin on his upper chest, and threaded wires into the muscles of his left arm.
- North America > United States > Colorado (0.24)
- North America > United States > Ohio (0.05)
- North America > United States > Louisiana > Ouachita Parish > Monroe (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Health Care Providers & Services (0.94)
- Government > Military (0.69)
High-tech brings its smarts to buildings
A California start-up called View, which has raised a whopping $500 million from investors including Corning, General Electric and Khosla Ventures, is making high-tech windows that have the potential to bring to buildings what high-resolution touchscreens did for smartphones. View's windows eliminate glare, change hue, moderate internal temperature -- and at some point, could show entirely different views of the outside world -- via a process that uses a pane of glass sprayed with electrochromic material, which alters light transmission. The result is smart glass that increases energy efficiency and promises better worker productivity, via technology accessed through an app. "When you look at smart glass, the only smart surface we saw was on our phones," says Ben Bajarin, an analyst for Creative Strategies who follows the industry. "Now, we believe consumers are moving toward an age where smart glass can do almost anything -- for example, project images of the sun on your windows during a rainy day or viewing data on the window." While elements of the technology have been around on a smaller scale, such as car windows, View is the first company to commercially produce such glass at a large scale.
- North America > United States > Mississippi > De Soto County > Olive Branch (0.07)
- North America > United States > Louisiana > Ouachita Parish > Monroe (0.06)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.06)
- (5 more...)
- Information Technology > Hardware (0.35)
- Information Technology > Communications (0.35)
- Information Technology > Artificial Intelligence (0.30)
CrossCat: A Fully Bayesian Nonparametric Method for Analyzing Heterogeneous, High Dimensional Data
Mansinghka, Vikash, Shafto, Patrick, Jonas, Eric, Petschulat, Cap, Gasner, Max, Tenenbaum, Joshua B.
There is a widespread need for statistical methods that can analyze high-dimensional datasets without imposing restrictive or opaque modeling assumptions. This paper describes a domain-general data analysis method called CrossCat. CrossCat infers multiple non-overlapping views of the data, each consisting of a subset of the variables, and uses a separate nonparametric mixture to model each view. CrossCat is based on approximately Bayesian inference in a hierarchical, nonparametric model for data tables. This model consists of a Dirichlet process mixture over the columns of a data table in which each mixture component is itself an independent Dirichlet process mixture over the rows; the inner mixture components are simple parametric models whose form depends on the types of data in the table. CrossCat combines strengths of mixture modeling and Bayesian network structure learning. Like mixture modeling, CrossCat can model a broad class of distributions by positing latent variables, and produces representations that can be efficiently conditioned and sampled from for prediction. Like Bayesian networks, CrossCat represents the dependencies and independencies between variables, and thus remains accurate when there are multiple statistical signals. Inference is done via a scalable Gibbs sampling scheme; this paper shows that it works well in practice. This paper also includes empirical results on heterogeneous tabular data of up to 10 million cells, such as hospital cost and quality measures, voting records, unemployment rates, gene expression measurements, and images of handwritten digits. CrossCat infers structure that is consistent with accepted findings and common-sense knowledge in multiple domains and yields predictive accuracy competitive with generative, discriminative, and model-free alternatives.
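The nested partition structure described in the abstract (a Dirichlet process mixture over columns, each component holding its own Dirichlet process mixture over rows) can be illustrated with a small sketch. This is not the authors' implementation: it only samples the two-level partition from a Chinese restaurant process prior, which is the standard partition view of a Dirichlet process, and it omits the per-cluster parametric models and the Gibbs sampler. The function names and the concentration parameters `alpha_cols` and `alpha_rows` are hypothetical, chosen here for illustration.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i joins existing cluster k with probability proportional to the
    cluster's current size, or opens a new cluster with probability
    proportional to alpha.
    """
    assignments = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        weights = counts + [alpha]  # last slot stands for a new cluster
        r = rng.random() * sum(weights)
        cumulative = 0.0
        k = len(weights) - 1
        for idx, w in enumerate(weights):
            cumulative += w
            if r < cumulative:
                k = idx
                break
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def sample_crosscat_structure(n_rows, n_cols, alpha_cols, alpha_rows, seed=0):
    """Sample a CrossCat-style latent structure (assumed, simplified form).

    Returns (col_to_view, row_clusters), where col_to_view[j] is the view
    holding column j and row_clusters[v][i] is the cluster of row i
    within view v. Each view gets an independent row partition.
    """
    rng = random.Random(seed)
    col_to_view = crp_partition(n_cols, alpha_cols, rng)
    n_views = max(col_to_view) + 1
    row_clusters = [crp_partition(n_rows, alpha_rows, rng) for _ in range(n_views)]
    return col_to_view, row_clusters


if __name__ == "__main__":
    views, rows = sample_crosscat_structure(n_rows=6, n_cols=5,
                                            alpha_cols=1.0, alpha_rows=1.0)
    print("column -> view:", views)
    for v, part in enumerate(rows):
        print(f"view {v} row clusters:", part)
```

Because each view draws its own row partition, two columns in different views can group the same rows differently, which is how the model separates multiple statistical signals in one table.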
- North America > United States > Texas > Hidalgo County > McAllen (0.14)
- Asia > Middle East > Jordan (0.05)
- North America > United States > New York (0.04)
- (12 more...)
- Law (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)